Link to shiny app: https://michaeldgarber.shinyapps.io/wf-emm/

1 Introduction

1.1 Background: Do et al. (GeoHealth, 2024)

Do and colleagues have recently estimated spatially varying effects of short-term exposure to wildfire PM2.5 on acute-care hospitalizations (Do et al. 2024) in California. They defined the exposure as a day with wildfire smoke PM2.5 ≥15 μg/m3. The outcome was the number of respiratory emergency department visits and unplanned hospitalizations.

They used a case-crossover design to estimate effects at the level of the zip-code tabulation area (ZCTA) throughout 1,396 ZCTAs in California from 2006-2019. A map of these spatially varying effect estimates, expressed as a rate difference per 100,000 person-days, appears below (adapted from their figure 5).

A positive rate difference (shades of green) indicates that wildfire had an estimated harmful effect (i.e., more) hospitalizations, while a negative value (purple shades) implies that wildfire led to fewer hospitalizations.

In addition, they explored community characteristics possibly driving this spatial heterogeneity. The authors state:

We used meta‐regression to evaluate potential effect modification by community characteristics on the effect of a wildfire smoke day on acute care utilization at the ZCTA level. For each community characteristic, which was selected a priori, we ran a meta‐regression of the pooled ZCTA‐specific rate difference on the community characteristic. To preserve statistical power, we excluded 100 ZCTAs without complete data for 14 community characteristics other than A/C prevalence, and we excluded 274 ZCTAs for meta‐regression of the A/C prevalence. Our estimates are reported as rate difference per interquartile range increase of the community characteristic.

Below, adapted from their figure 6, is a plot of the increase in rate difference per interquartile range (IQR) increase for each socio-demographic community characteristic they considered:

This analysis of effect modification describes how the estimated effect of wildfire smoke on acute-care utilization varies based on variation in the community characteristics. Results show, for example, that the wildfire-utilization effect was stronger in communities with lower levels of air conditioning and with higher levels of uninsured populations.

Interpreting a specific value from the figure, an increase in A/C proportion from the 25th percentile to the 75th percentile was associated with a 0.239 (95% confidence interval: 0.0672, 0.411) lower rate rate difference in the effect of wildfire smoke on same-day respiratory acute-care utilization.

1.2 Distinction between effect modification vs interaction

It is important to note that just because wildfire’s effects on acute-care utilization vary across these values, it does not necessarilly mean that intervenining to change these variables would necessarily change the wildfire-healthcare-utilization effect.

Tyler VanderWeele describes the important distinction between effect modification and interaction on p. 268 of his book, Explanation in Causal Inference (bold added for emphasis here):

If we found that the effect of our primary exposure varied by strata defined by the secondary factor in this way, then we might call  this “effect heterogeneity” or “effect modification.” This might be useful, for example, in decisions about which subpopulations to target in order to maximize the effect of interventions. Provided that we have controlled for confounding of relationship between the primary exposure and the outcome, these estimates of effect modification or effect heterogeneity could be useful even if we have not controlled for confounding of the relationship between the secondary factor and the outcome. (Tyler J. VanderWeele 2015)

What we would not know, however, is whether the effect heterogeneity were due to the secondary factor itself, or something else associated with it. If we have not controlled for confounding for the secondary factor, the secondary factor itself may simply be serving as a proxy for something that is causally relevant for the outcome (Tyler J. VanderWeele and Robins 2007).

VanderWeele continues (bold again added for its relevance here):

If we are interested principally in assessing the effect of the primary exposure within subgroups defined by a secondary factor then simply controlling for confounding for the relationship between the primary exposure and the outcome is sufficient. However, if we want to intervene on the secondary factor in order to change the effect of the primary exposure then we need to control for confounding of the relationships of both factors with the outcome. When we control for confounding for both factors we might refer to this as “causal interaction” in distinction from mere “effect heterogeneity” mentioned above (T. J. VanderWeele 2009).

That is, effect modification can be considered a description of variation in an effect of a primary exposure over values over a secondary variable, whereas assessment of interaction requires the secondary factor to be considered as a causal factor.

To inform policy, it would be useful assess interaction to know what environmental factors could be intervened upon to mitigate wildfire’s health effects. Both tree canopy and air conditioning could plausibly attenuate wildfire’s health effects and in that way, interact with wildfire on the outcome. In the case of tree canopy, previous research has shown that green space can reduce air pollution.(Diener and Mudu 2021) And previous research in California has suggested that air conditioning use could mitigate the health effects of wildfires.(Stowell et al. 2025)

In addition, they are socio-environmental pathways that are relatively amenable to intervention. It would be a realistic policy option to encourage more tree planting, for example, or institute policies to encourage more air conditoning use during wildfire events.

However, as both of these factors may be associated with other factors on the causal pathway from wildfire to health, it would be important to control for potential confounders.

1.3 The need for locally developed health-impact assessment

Furthermore, many health-impact assessment studies make a strong transportability assumption–using dose-response functions developed elsewhere and applying it to the local context. There is a need for locally tailored health-impact assessment studies.

1.4 Analysis goals

The primary goal of this analysis is to assess how the distribution of the effect of wildfire onhealthcare utilization effect could change were there interventions changing environmental factors that influence the effect. We do so estimating the effect of tree canopy and air conditioning on the spatially varying rate difference.

A secondary goal, related to training and skill development, is to explore the utility of R Shiny tools for presenting high-dimensional results with which a user can interact. For example, a user could choose to raise the distribution of the effect modifier to the 50th percentile and see how results might change.)

2 Methods

The overall approach is as follows:

  1. Estimate the effect of both tree canopy and air conditioning (separately) on the spatially varying rate difference, controlling for potential confounders that may influence these associations such as urban vs rural status, biome, and socioeconomic characteristics. In these models, we will also consider interaction variables, allowing the relationship between these main exposures (tree canopy and AC) and the rate difference to vary across strata of a third variable.
  2. Use these models to simulate the overall burden and spatial distribution of healthcare utilization attributable to wildfire that could be averted by intervening upon these variables.that

2.1 Data exploration

To inform modeling, we explored the distribution of all variables as well as bivariate associations between tree canopy, air conditioning, the rate difference, and other covariates.

2.1.1 Univariate distributions

2.1.1.1 Outcome: Rate difference per 100,000 person-days

## # A tibble: 4 × 3
##   ruca_cat rd_100k_pt_med rd_100k_pt_mean_wt
##   <fct>             <dbl>              <dbl>
## 1 (0,3]           -0.0331              0.196
## 2 (3,6]           -0.0351             -0.164
## 3 (6,9]           -0.522               0.141
## 4 (9,10]          -0.732              -0.306

2.1.1.2 Intervention variables: Tree canopy and air conditioning

2.1.1.3 Proportion above poverty and proportion insured

2.1.1.4 Biome and rural-urban classification

Definitions of rural-urban commuting codes:

  1. Metropolitan area core: primary flow within an urbanized area (UA)
  2. Metropolitan area high commuting: primary flow 30% or more to a UA
  3. Metropolitan area low commuting: primary flow 10% to 30% to a UA
  4. Micropolitan area core: primary flow within an Urban Cluster of 10,000 to 49,999 (large UC)
  5. Micropolitan high commuting: primary flow 30% or more to a large UC
  6. Micropolitan low commuting: primary flow 10% to 30% to a large UC
  7. Small town core: primary flow within an Urban Cluster of 2,500 to 9,999 (small UC)
  8. Small town high commuting: primary flow 30% or more to a small UC
  9. Small town low commuting: primary flow 10% to 30% to a small UC
  10. Rural areas: primary flow to a tract outside a UA or UC

2.1.2 Bivariate and stratified associations

2.1.2.1 RD x Tree canopy x Covariates

Here are scatterplots plotting the rate difference against the (square root of) proportion tree canopy:

There is a slight negative association between tree canopy and the RD, but it is rather weak.

We also stratified this association by urban-rural category:

The stratified scatterplot suggests that in the most urban areas and in the least urban areas, higher tree canopy is associated with a lower rate difference, whereas in the other two RUCA categories, the association is in the other direction: higher tree canopy is associated with a higher rate difference (i.e., suggesting a more harmful effect of wildfire on the outcome).

2.1.2.2 RD x Proportion A/C x Covariates

The association with air conditioning is weakly negative.

Like with tree canopy, the RD x A/C association varies by rural-urban category. It is more negative in more populous areas, and positive–suggesting air conditioning is associated with a higher difference effect of wildfire on hospitalizations–in less populous areas.

2.1.2.3 RD x proportion above poverty

2.1.2.4 RD x proportion insured

The association between the RD and proportion with insurance is stronger than that between proportion above poverty.

2.1.2.5 RD x biome

The RD distribution is in the positive direction in Mediterranean Forests, Woodlands & Scrub and more negative in all the others.

biome_name_freq rd_100k_pt_med rd_100k_pt_mean_wt
Mediterranean Forests, Woodlands & Scrub 0.154 0.341
Temperate Conifer Forests -0.282 0.043
Temperate Grasslands, Savannas & Shrublands -0.498 -0.444
Deserts & Xeric Shrublands -0.756 -0.534

2.1.2.6 RD x urban-rural category

The distribution is more negative in more rural areas, suggesting in those areas, wildfires had a preventive effect on healthcare utilization

ruca_cat rd_100k_pt_med rd_100k_pt_mean_wt
(0,3] -0.033 0.196
(3,6] -0.035 -0.164
(6,9] -0.522 0.141
(9,10] -0.732 -0.306

2.2 Modeling

Considering these empirical trends in the data along with substantive considerations of the plausible causal pathway, we decided upon the following models:

2.2.1 Tree canopy

Control variables:

  • proportion insured

  • rural-urban category

  • biome

Interaction: by rural-urban category

glm_rd_e_tree_canopy_prop_sqrt_conf_ins_biome_int_ruca =glm(
  rd_quo_pt~tree_canopy_prop_sqrt +insured_prop + ruca_cat +
    biome_name_freq+
    tree_canopy_prop_sqrt*ruca_cat,
  family = gaussian,
  na.action = "na.exclude", 
  weights = rd_quo_weight,
  data = wf_eff_emm_wide_no_outliers
)

summary(glm_rd_e_tree_canopy_prop_sqrt_conf_ins_biome_int_ruca)
## 
## Call:
## glm(formula = rd_quo_pt ~ tree_canopy_prop_sqrt + insured_prop + 
##     ruca_cat + biome_name_freq + tree_canopy_prop_sqrt * ruca_cat, 
##     family = gaussian, data = wf_eff_emm_wide_no_outliers, weights = rd_quo_weight, 
##     na.action = "na.exclude")
## 
## Coefficients:
##                                                                Estimate
## (Intercept)                                                 0.000014172
## tree_canopy_prop_sqrt                                      -0.000014162
## insured_prop                                               -0.000008866
## ruca_cat(3,6]                                              -0.000005143
## ruca_cat(6,9]                                              -0.000008299
## ruca_cat(9,10]                                              0.000011480
## biome_name_freqTemperate Conifer Forests                   -0.000002411
## biome_name_freqTemperate Grasslands, Savannas & Shrublands -0.000007674
## biome_name_freqDeserts & Xeric Shrublands                  -0.000002769
## tree_canopy_prop_sqrt:ruca_cat(3,6]                         0.000025108
## tree_canopy_prop_sqrt:ruca_cat(6,9]                         0.000017809
## tree_canopy_prop_sqrt:ruca_cat(9,10]                       -0.000018422
##                                                              Std. Error t value
## (Intercept)                                                 0.000007040   2.013
## tree_canopy_prop_sqrt                                       0.000004578  -3.094
## insured_prop                                                0.000008133  -1.090
## ruca_cat(3,6]                                               0.000003369  -1.526
## ruca_cat(6,9]                                               0.000004751  -1.747
## ruca_cat(9,10]                                              0.000004896   2.345
## biome_name_freqTemperate Conifer Forests                    0.000002161  -1.116
## biome_name_freqTemperate Grasslands, Savannas & Shrublands  0.000001168  -6.570
## biome_name_freqDeserts & Xeric Shrublands                   0.000002186  -1.267
## tree_canopy_prop_sqrt:ruca_cat(3,6]                         0.000008155   3.079
## tree_canopy_prop_sqrt:ruca_cat(6,9]                         0.000012909   1.380
## tree_canopy_prop_sqrt:ruca_cat(9,10]                        0.000009972  -1.847
##                                                                   Pr(>|t|)    
## (Intercept)                                                        0.04431 *  
## tree_canopy_prop_sqrt                                              0.00202 ** 
## insured_prop                                                       0.27590    
## ruca_cat(3,6]                                                      0.12720    
## ruca_cat(6,9]                                                      0.08092 .  
## ruca_cat(9,10]                                                     0.01919 *  
## biome_name_freqTemperate Conifer Forests                           0.26470    
## biome_name_freqTemperate Grasslands, Savannas & Shrublands 0.0000000000737 ***
## biome_name_freqDeserts & Xeric Shrublands                          0.20545    
## tree_canopy_prop_sqrt:ruca_cat(3,6]                                0.00212 ** 
## tree_canopy_prop_sqrt:ruca_cat(6,9]                                0.16795    
## tree_canopy_prop_sqrt:ruca_cat(9,10]                               0.06493 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.0000000003597093)
## 
##     Null deviance: 0.00000048367  on 1269  degrees of freedom
## Residual deviance: 0.00000045251  on 1258  degrees of freedom
## AIC: -24519
## 
## Number of Fisher Scoring iterations: 2

2.2.2 Air conditioning

Control variables:

  • proportion insured

  • rural-urban category

  • biome

Interaction: by rural-urban category

glm_rd_e_ac_prop_conf_ins_biome_int_ruca =glm(
  rd_quo_pt~ac_prop +insured_prop + ruca_cat + biome_name_freq+
    ac_prop*ruca_cat,
  family = gaussian,
  na.action = "na.exclude", 
  weights = rd_quo_weight,
  data = wf_eff_emm_wide_no_outliers
)
summary(glm_rd_e_ac_prop_conf_ins_biome_int_ruca)
## 
## Call:
## glm(formula = rd_quo_pt ~ ac_prop + insured_prop + ruca_cat + 
##     biome_name_freq + ac_prop * ruca_cat, family = gaussian, 
##     data = wf_eff_emm_wide_no_outliers, weights = rd_quo_weight, 
##     na.action = "na.exclude")
## 
## Coefficients:
##                                                                Estimate
## (Intercept)                                                 0.000023207
## ac_prop                                                    -0.000001573
## insured_prop                                               -0.000022192
## ruca_cat(3,6]                                               0.000005056
## ruca_cat(6,9]                                              -0.000005789
## ruca_cat(9,10]                                             -0.000017724
## biome_name_freqTemperate Conifer Forests                   -0.000003888
## biome_name_freqTemperate Grasslands, Savannas & Shrublands -0.000008270
## biome_name_freqDeserts & Xeric Shrublands                   0.000004871
## ac_prop:ruca_cat(3,6]                                      -0.000004942
## ac_prop:ruca_cat(6,9]                                       0.000002527
## ac_prop:ruca_cat(9,10]                                      0.000030892
##                                                              Std. Error t value
## (Intercept)                                                 0.000007569   3.066
## ac_prop                                                     0.000001651  -0.953
## insured_prop                                                0.000008421  -2.635
## ruca_cat(3,6]                                               0.000003357   1.506
## ruca_cat(6,9]                                               0.000005838  -0.992
## ruca_cat(9,10]                                              0.000004412  -4.017
## biome_name_freqTemperate Conifer Forests                    0.000002153  -1.806
## biome_name_freqTemperate Grasslands, Savannas & Shrublands  0.000001450  -5.705
## biome_name_freqDeserts & Xeric Shrublands                   0.000002363   2.061
## ac_prop:ruca_cat(3,6]                                       0.000005237  -0.944
## ac_prop:ruca_cat(6,9]                                       0.000007973   0.317
## ac_prop:ruca_cat(9,10]                                      0.000006510   4.745
##                                                               Pr(>|t|)    
## (Intercept)                                                    0.00222 ** 
## ac_prop                                                        0.34087    
## insured_prop                                                   0.00853 ** 
## ruca_cat(3,6]                                                  0.13230    
## ruca_cat(6,9]                                                  0.32165    
## ruca_cat(9,10]                                             0.000062963 ***
## biome_name_freqTemperate Conifer Forests                       0.07122 .  
## biome_name_freqTemperate Grasslands, Savannas & Shrublands 0.000000015 ***
## biome_name_freqDeserts & Xeric Shrublands                      0.03950 *  
## ac_prop:ruca_cat(3,6]                                          0.34553    
## ac_prop:ruca_cat(6,9]                                          0.75138    
## ac_prop:ruca_cat(9,10]                                     0.000002357 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.0000000003572641)
## 
##     Null deviance: 0.00000042252  on 1103  degrees of freedom
## Residual deviance: 0.00000039013  on 1092  degrees of freedom
##   (166 observations deleted due to missingness)
## AIC: -21316
## 
## Number of Fisher Scoring iterations: 2

2.2.3 Other modeling notes

  • Considering the outcome in the model is itself a modeled result (with a point estimate and a standard error), we use meta-regression to weight observations (ZCTAs) with more precise estimates more highly. Specifically, we weight ZCTAs in each model proportional to the inverse of their standard error.

  • We used simple linear regression because the outcome (the rate difference) is normally distributed. Unlike a rate, for example, it is not bounded by zero, so Poisson regression is not needed.

2.3 Predicting values over several scenarios

2.3.1 Scenario definitions

We then used the model coefficients to predict a new rate difference between wildfire and acute-care utilization over various scenarios. For both tree canopy and proportion AC, we found quintile values (i.e., 20th percentile, 40th percentile, 60th percentile). We then developed scenarios where we raised the values of all ZCTAs below that quintile value to that quintile’s value. And we plugged in the alternative value into the model to predict the new RD.

For tree canopy, we found quintiles relative to biome because tree canopy is strongly associated with biome, and it may not be realistic to raise tree canopy in the desert up to the 80th percentile of a temperate forest.

Air conditioning also varies considerably by biome, but presumably it is feasible to raise AC regardless of biome, so we not define scenarios relative to biome for AC.

2.3.2 Calculating the difference between the two rate differences

For both tree canopy and A/C, we calculated the difference between the alternative rate difference and the baseline (status-quo) rate difference.

3 Results

Tables summarizing results 1) over all ZCTAs and 2) by RUCA are below.

Target percentile refers to the percentile of the effect modifier to which all ZCTAs with values below that percentile value would be raised under the target scenario. For example, if the target scenario corresponds to the 60th percentile of air conditioning, all ZCTAs below the 60th percentile would have their A/C raised to the 60th percentile.

To-dos:

  • Rename variables in the table & define them

  • incorporate uncertainty by bootstrapping

  • add results for air conditioning (will be lower in magnitude)

  • map results

  • interactive dashboard with maps and results by scenario is here (still need to update): https://michaeldgarber.shinyapps.io/wf-emm/

target_percentile n_zcta n_cases_diff_alt_pt n_cases_diff_quo_pt n_cases_diff_quo_pred_pt rd_alt_pt_med rd_alt_pt_mean_wt pop rd_alt_pt rd_quo_pt rd_quo_pred_pt rd_diff_act_pt rd_diff_pred_pt n_cases_diff_diff_act_pt n_cases_diff_diff_pred_pt
0.2 255 24.933637 35.86380 27.33513 0.0000033 0.0000027 9186814 0.0000027 0.0000039 0.0000030 -0.0000012 -0.0000003 -10.93017 -2.401491
0.4 508 39.793469 72.52769 47.77917 0.0000029 0.0000022 17991254 0.0000022 0.0000040 0.0000027 -0.0000018 -0.0000004 -32.73422 -7.985704
0.6 762 40.519981 87.12975 59.44289 0.0000023 0.0000015 26453537 0.0000015 0.0000033 0.0000022 -0.0000018 -0.0000007 -46.60977 -18.922910
0.8 1014 9.493449 76.58264 63.17616 0.0000009 0.0000003 33930239 0.0000003 0.0000023 0.0000019 -0.0000020 -0.0000016 -67.08919 -53.682711
1.0 1266 -187.375402 74.68940 54.49182 -0.0000050 -0.0000049 38583719 -0.0000049 0.0000019 0.0000014 -0.0000068 -0.0000063 -262.06480 -241.867221

References

Diener, Arnt, and Pierpaolo Mudu. 2021. “How Can Vegetation Protect Us from Air Pollution? A Critical Review on Green Spaces’ Mitigation Abilities for Air-Borne Particles from a Public Health Perspective - with Implications for Urban Planning.” Science of The Total Environment 796 (November): 148605. https://doi.org/10.1016/j.scitotenv.2021.148605.
Do, V., C. Chen, T. Benmarhnia, and J. A. Casey. 2024. “Spatial Heterogeneity of the Respiratory Health Impacts of Wildfire Smoke PM 2.5 in California.” GeoHealth 8 (4): e2023GH000997. https://doi.org/10.1029/2023GH000997.
Stowell, Jennifer D, Ian Sue Wing, Yasmin Romitti, Patrick L Kinney, and Gregory A Wellenius. 2025. “Emergency Department Visits in California Associated with Wildfire PM2.5 : Differing Risk Across Individuals and Communities.” Environmental Research: Health 3 (1): 015002. https://doi.org/10.1088/2752-5309/ad976d.
VanderWeele, T J. 2009. “On the Distinction Between Interaction and Effect Modification.” Epidemiology (Cambridge, Mass.) 20 (6): 863.
VanderWeele, Tyler J. 2015. “An Introduction to Interaction Analysis.” In, 249–85. New York, NY: Oxford University Press.
VanderWeele, Tyler J, and James M Robins. 2007. “Four Types of Effect Modification: A Classification Based on Directed Acyclic Graphs.” Epidemiology 18 (5): 561–68. https://doi.org/10.1097/EDE.0b013e318127181b.